Experimental: automated, scheduled, dependency free online DDL via gh-ost/pt-online-schema-change #6547
Conversation
|
Zero dependencies doesn't mean zero configuration. What's the throttle replication lag? I'm used to as low as |
|
I'm very excited to see this type of automation. Solving problems everyone deals with in a Vitess-native way goes a long way towards driving mass adoption. At the same time, it's somewhat disappointing that it is using |
👋 it's always best to be explicit. I'm not sure if your impression is that I'm fighting a religious war or am just too obsessed with my own creation 😄 , this isn't the case. FWIW
I'm sorry to hear that, and apologize if I've alienated you in any way. I'm not sure my view on MySQL foreign keys should be alienating people and I'm dumbfounded that this is the case. |
|
Apologies, my comment wasn't meant to be a personal attack or to suggest that you have personally alienated me. Your tools are awesome, and my "I get why" comment was just me acknowledging that you wrote it and thus are able to move the quickest with it, not to mention that it probably has been the most requested integration. As for FKs, I wasn't trying to say that you have alienated anyone personally with your views, just recognizing that I'm aware of them from prior posts. Given the history of gh-ost where you were working at specific companies that didn't use FKs, it makes total sense to not deal with the extra complexity that they bring to DB tooling. In Vitess by contrast, where one of the main areas of focus is full MySQL compatibility, we're trying to support the majority of workloads, many of which include FKs, so it'd be great to support them eventually, whether that is achieved via gh-ost, pt-osc, vreplication, or something else. Again I'm sorry for coming across negatively. I've personally interacted with you building the Orchestrator integration with the Vitess helm charts and have always been impressed by your knowledge and willingness to help. I was super excited when I found out you were going to Planetscale. As I mentioned originally, I love that you are doing the work to add this level of automation, taking advantage of the control plane that doesn't exist in vanilla MySQL, and will really help to drive adoption of Vitess. I look forward to continued interaction with you and want you to know that I hold you in the highest regard. |
|
😍 |
|
@derekperkins Thank you for your kind message ❤️ and I also reflect that I may take some words to present differently than intended, as I'm not a native English speaker and I can mis-parse things. I also very much enjoyed working with you in our Regarding foreign keys, there's two ways forward:
|
|
Thanks so much for this work! I am incredibly excited about the prospect of online schema change as first-class feature supported in Vitess. We have been using
Here are some of the things we are working on adding -
My only concern with the current proposal is the overhead with using the topo server for co-ordination of schema changes. |
|
@ameetkotian addressing some of the bullet points:
Yes. As mentioned above, first iteration will not support concurrent migration+reshard operation, but that should be solved in future iterations. The current PR as it is still does not address the topic of resharding. With regard to failovers, again current PR does not address it, but the idea is that in the short term we will identify a failover and restart the migration. Possibly, and only where
I see that more as an external migration tracking/management system ownership. At least for now, the purpose of the PR is to provide the mechanics for online schema changes.
Agreed
Agreed. I suspect
At this time I see this at a higher level than
I'd like to point you to this experimental PR, checksumming data on the fly. I haven't yet tested it in production.
👍 This is on my agenda. |
|
Possible syntax change:
|
|
Recent commit, c68d438, changes syntax to
and also breaks |
The WIP on VExec will mostly eliminate that. We will only write to global |
|
I have a POC for
|
|
|
|
Shlomi I am soooo excited about this!
I can provide some details around this.
|
|
The current implementation, by the way, is to run gh-ost directly on the master server via its tablet. I'm considering keeping it that way, and using the replicas only for throttling. Since vitess requires ROW binlog format in the first place, this should be a safe decision. As for replicas taken down for backup or for other reasons, I wish to use freno as the all-knowing throttling service, and that’s in the mid-term run. In the short term, I still need to figure it out... |
I like this syntax choice a lot, it's very readable. Are there any configuration options we might want to set in SQL? I'm not sure if it makes sense, but maybe these could be pseudo function calls,
I'm glad that there's a viable path to support them down the road. For the reasons you've laid out in other comments, I'd prefer to see support in |
Yeah, I suspect we'd need to support some config via SQL; in particular, I'm looking at what's an acceptable replication lag.
|
|
On the topic of handling failures:
|
|
@rohit-nayak-ps I've now pushed my changes to Notable changes:
|
|
re: |
|
Migration options now available: alter with_ghost table my_table ... -- no options
alter with_ghost '--max-lag-millis=1500' table my_table ...
alter with_pt '--max-lag 1.5s --null-to-not-null' table my_table ... |
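To make the shape of these statements concrete, here is a minimal Go sketch that splits the strategy hint and the quoted options string from the rest of the statement. It is only an illustration under stated assumptions: the `ParsedAlter` type, the `ParseOnlineAlter` function and the regular expression are hypothetical and are not the parser change made in this PR. Options are split on whitespace, so an option value containing spaces would not survive.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// onlineDDLRe matches the experimental syntax quoted above:
//   alter with_ghost ['<options>'] table ...
//   alter with_pt    ['<options>'] table ...
// The pattern is an assumption made for this sketch only.
var onlineDDLRe = regexp.MustCompile(`(?is)^\s*alter\s+(with_ghost|with_pt)(?:\s+'([^']*)')?\s+(table\s.*)$`)

// ParsedAlter is a hypothetical container for the pieces of the statement.
type ParsedAlter struct {
	Strategy string   // "with_ghost" or "with_pt"
	Options  []string // whitespace-separated flags from the quoted options string
	Alter    string   // the statement with the hint and options removed
}

// ParseOnlineAlter separates the hint and its options from the statement.
// ok is false when the statement carries no online DDL hint.
func ParseOnlineAlter(sql string) (p ParsedAlter, ok bool) {
	m := onlineDDLRe.FindStringSubmatch(sql)
	if m == nil {
		return p, false
	}
	p.Strategy = strings.ToLower(m[1])
	p.Options = strings.Fields(m[2])
	p.Alter = "alter " + m[3]
	return p, true
}

func main() {
	stmt := "alter with_pt '--max-lag 1.5s --null-to-not-null' table my_table engine=innodb"
	p, ok := ParseOnlineAlter(stmt)
	fmt.Println(ok, p.Strategy, p.Options, p.Alter)
}
```

A plain `alter table ...` statement does not match the pattern, so it would fall through to the normal, synchronous path.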
|
It's now possible to
syntax subject to change:
|
|
Suggestions from Andrew Mason in Vitess slack: I'm thinking a simple solution would be to add two flags to vttablet that is like: Another thought with respect to not passing |
|
Incorporated #6815, where the throttler is disabled, by default. |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
|
The test The watched path is |
|
Found it! |
|
Upon migration completion (whether successful or failed), online-ddl executor renames away the artifacts. This uses some logic from #6719 :
|
…endtoend tests Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
|
I'm ready to have this PR merged. It now supports throttling and table lifecycle. I have not made changes to the |
|
OMG 🎉 |
|
Pointing out that the |
This checks if a vtgate is currently filtering keyspaces before requesting the TopoServer. This is necessary because a TopoServer can't be accessed in those cases as the filtered Topo in those cases could make it unsafe to make writes since all reads would be returning a subset of the actual topo data. The only use of the requested topoServer that I found was in the DDL handling path and was introduced in vitessio#6547. This is deployed on dev but should get testing (endtoend or unit, unclear on best path atm) before going upstream.
This checks if a vtgate is currently filtering keyspaces before requesting the TopoServer. This is necessary because a TopoServer can't be accessed in those cases as the filtered Topo in those cases could make it unsafe to make writes since all reads would be returning a subset of the actual topo data. The only use of the requested topoServer that I found was in the DDL handling path and was introduced in vitessio#6547. This is deployed on dev but should get testing (endtoend or unit, unclear on best path atm) before going upstream. # Conflicts: # go/vt/vtgate/vcursor_impl.go Signed-off-by: Richard Bailey <rbailey@slack-corp.com>
This PR (work in progress) introduces zero dependency online schema changes with gh-ost/pt-online-schema-change.
UPDATE: this comment was edited to reflect support for pt-online-schema-change. Originally this PR only supported gh-ost. Mostly, whenever you see gh-ost, consider pt-online-schema-change to apply as well.
TL;DR
User will issue:
or
$ vtctl -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global \
    ApplySchema -sql "alter with 'gh-ost' table example modify id bigint unsigned not null" commerce
$ vtctl -topo_implementation etcd2 -topo_global_server_address localhost:2379 -topo_global_root /vitess/global \
    ApplySchema -sql "alter with 'pt-osc' table example modify id bigint unsigned not null" commerce
and vitess will schedule an online schema change operation to run on all relevant shards, then proceed to apply the change via gh-ost on all shards.
While this PR is WIP, this flow works. More breakdown to follow, indicating what's been done and what's still missing.
The ALTER TABLE problem
First, to reiterate the problem: schema changes have always been a problem with MySQL; a straight ALTER is a blocking operation; an ONLINE ALTER is only "online" on the master/primary, but is effectively blocking on replicas. Online schema change tools like pt-online-schema-change and gh-ost overcome these limitations by emulating an ALTER on a "ghost" table, which is populated from the original table, then swapped in its place.
For disclosure, I authored gh-ost's code as part of the database infrastructure team at GitHub.
Traditionally, online schema changes are considered to be "risky". Trigger based migrations add significant load onto the master server, and their cut-over phase is known to be a dangerous point. gh-ost was created at GitHub to address these concerns, and successfully eliminated concerns for operational risks: with gh-ost the load on the master is low and well controlled, and the cut-over phase is known to cause no locking issues. gh-ost comes with different risks: it applies data changes programmatically, thus the issue of data integrity is of utmost importance. Another note of concern is data traffic: going out from MySQL into gh-ost and back into MySQL (as opposed to all-in MySQL in pt-online-schema-change).
One way or another, running an online schema change is typically a manual operation. A human being will schedule the migration, kick it running, monitor it, possibly cut-over. In a sharded environment, a developer's request to ALTER TABLE explodes to n different migrations, each of which needs to be scheduled, kicked, monitored & tracked.
Sharded environments are obviously common for vitess users and so these users feel the pain more than others.
Schema migration cycle & steps
Schema management is a process that begins with the user designing a schema change, and ends with the schema being applied in production. This is a breakdown of schema management steps as I know them:
- ALTER TABLE or pt-online-schema-change or gh-ost command)
What we propose to address
Vitess's architecture uniquely positions it to be able to automate away much of the process. Specifically:
- ALTER TABLE statement into a gh-ost invocation is super useful if done by vitess, since vitess can not only validate schema/params, but also can provide credentials, identify a throttle-control replica, can instruct gh-ost on how to communicate progress via hooks, etc.
- vitess just knows where the table is located. It knows if the schema is sharded. It knows who the shards are, who the shards' masters are. It knows where to run gh-ost. Last, vitess can tell us which replicas we can use for throttling.
- vttablet is the ideal entity to run a migration; can read instructions from topo server and can write progress to topo server.
- vitess is aware of possible master failovers and can request a re-execute if a migration is so interrupted mid process.
- vtctld API can offer endpoints to track status of a migration (e.g. "in progress on -80, in queue on 80-"). It may offer progress pct and ETA.
- gh-ost, the cut-over phase is safe to automate away.
- vttablet is in an excellent position to automate that away.
What this PR does, and what we expect to achieve
The guideline for this PR is: zero added dependencies; everything must be automatically and implicitly available via a normal vitess installation.
A breakdown:
User facing
This PR enables the user to run an online schema migration (aka online DDL) via:
- vtgate: the user connects to vitess with their standard MySQL client, and issues an ALTER WITH 'gh-ost' TABLE ... statement. Notice this isn't valid MySQL syntax; it's a hint for vitess that we want to run this migration online. vitess still supports synchronous, "normal" ALTER TABLE statements, which IMO should be discouraged.
- vtctl: the user runs vtctl ApplySchema -sql "alter with 'gh-ost' table ...".
The response, in both cases, is a migration ID, or a job ID, if you will. Consider the following examples.
via vtgate:
via vtctl:
In both cases, a UUID is returned, which can be used for tracking (WIP) the progress of the migration across shards.
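As a small illustration of handing back an ID at submission time, the Go sketch below generates a UUID using only the standard library. The function name `NewMigrationID` is hypothetical. Note the substitution: the IDs quoted in this PR (for example 90c5afd4-da38-11ea-a3ff-f875a4d24e90) appear to be time-based UUIDs, whereas this sketch produces a random, version 4 UUID, which is the simplest form to build without an external package.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// NewMigrationID returns a random (version 4) UUID string.
// Hypothetical helper: the PR's own IDs appear to be time-based UUIDs.
func NewMigrationID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := NewMigrationID()
	if err != nil {
		panic(err)
	}
	// The ID is what the caller would receive right after submitting the ALTER.
	fmt.Println("migration submitted, tracking ID:", id)
}
```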
Parser
Vitess' parser now accepts ALTER WITH 'gh-ost' TABLE and ALTER WITH 'pt-osc' TABLE syntax. We're still to determine if this is the exact syntax we want to go with.
Topo
Whether submitted by vtgate or vtctl, we don't immediately run the migration. As mentioned before, we may wish to postpone the migration. Perhaps the relevant servers are already running a migration.
Instead, we write the migration request into global topo, e.g.:
/vitess/global/schema-migration/requests/90c5afd4-da38-11ea-a3ff-f875a4d24e90
{"keyspace":"commerce","table":"example","sql":"alter table example modify id bigint not null","uuid":"90c5afd4-da38-11ea-a3ff-f875a4d24e90","online":true,"time_created":1596701930662801294,"status":"requested"}
Once we create the request in topo, we immediately return the generated UUID/migration ID (90c5afd4-da38-11ea-a3ff-f875a4d24e90 in the above example) to the user.
vtctld
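The request entry that gets written to global topo, and that vtctld later picks up, can be sketched in Go as follows. The JSON field names and the schema-migration/requests path are taken from the example in the Topo section; the `MigrationRequest` type, its `Encode` method and the `RequestPath` helper are hypothetical names used for illustration and are not necessarily how the PR's OnlineDDL struct is laid out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// MigrationRequest mirrors the JSON fields of the example request.
// The Go type and field names are illustrative assumptions.
type MigrationRequest struct {
	Keyspace    string `json:"keyspace"`
	Table       string `json:"table"`
	SQL         string `json:"sql"`
	UUID        string `json:"uuid"`
	Online      bool   `json:"online"`
	TimeCreated int64  `json:"time_created"` // the example value reads as nanoseconds since the Unix epoch
	Status      string `json:"status"`
}

// requestsDir is the directory seen in the example, relative to the global topo root.
const requestsDir = "schema-migration/requests"

// RequestPath returns the topo key under which a request would be stored.
func RequestPath(globalRoot, uuid string) string {
	return path.Join(globalRoot, requestsDir, uuid)
}

// Encode serializes the request to the JSON form shown in the example.
func (r MigrationRequest) Encode() (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	req := MigrationRequest{
		Keyspace:    "commerce",
		Table:       "example",
		SQL:         "alter table example modify id bigint not null",
		UUID:        "90c5afd4-da38-11ea-a3ff-f875a4d24e90",
		Online:      true,
		TimeCreated: 1596701930662801294,
		Status:      "requested",
	}
	body, err := req.Encode()
	if err != nil {
		panic(err)
	}
	fmt.Println(RequestPath("/vitess/global", req.UUID))
	fmt.Println(body)
}
```

Keeping the UUID both in the key and in the payload means a reader of the entry does not need to parse the path to know which migration it describes.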
vtctld gets a conceptual "upgrade" with this PR. It is no longer a reactive service. vtctld now actively monitors new schema-migration/requests in topo.
When it sees such a request, it evaluates what are the relevant n shards.
With current implementation, it writes n "job" entries, one per shard, e.g. /vitess/global/schema-migration/jobs/commerce/-80/ce45b84a-da2d-11ea-b490-f875a4d24e90 and /vitess/global/schema-migration/jobs/commerce/80-/ce45b84a-da2d-11ea-b490-f875a4d24e90 for a keyspace with two shards; or just /vitess/global/schema-migration/jobs/commerce/0/1dd17132-da23-11ea-a3d2-f875a4d24e90 for a keyspace with one shard.
DONE: WIP: we will investigate use of new VExec to actually distribute the jobs to vttablet.
What vtctld does now is, once it sees a migration request, it pushes a VExec request for that migration. If the VExec request succeeds, that means all shards have been notified, and vtctld can stow away the migration request (work is complete as far as vtctld is concerned). If VExec returns with an error, that means at least one shard did not get the request, and vtctld will keep retrying pushing this request.
vttablet
This is where most of the action takes place. vttablet runs a migration service which continuously probes for, schedules, and executes migrations.
DONE:
With current implementation, tablets which have tablet_type=MASTER continuously probe for new entries. We look to replace this with VExec.
Migration requests are pushed via VExec; the request includes the INSERT IGNORE query that persists the migration in _vt.schema_migrations. The tablet no longer reads from, nor writes to, Global Topo.
A new table is introduced: _vt.schema_migrations, which is how vttablet manages and tracks its own migrations.
- vttablet will only run a single migration at a time.
- vttablet will see if there are unhandled migration requests. It will queue them.
- vttablet will make a migration ready if there's no running migration and no other migration is marked as ready.
- vttablet will run a ready migration. This is really the interesting part, with lots of goodies:
  - vttablet will evaluate the gh-ost ... command to run. It will obviously populate --alter=... --database=....
  - vttablet creates a temp directory where it generates a script to run gh-ost.
  - vttablet creates a hooks path and auto-generates hook files. The hooks will interact with vttablet.
  - vttablet has an API endpoint by which the hooks can communicate gh-ost's status (started/running/success/failure) with vttablet.
  - vttablet provides gh-ost with --hooks-hint which is the migration's UUID.
  - vttablet automatically generates a gh-ost user on the MySQL server, with a random password. The password is never persisted and does not appear on ps. It is written to, and loaded from, an environment variable.
  - vttablet grants the proper privileges on the newly created account.
  - vttablet will destroy the account once migration completes.
- vitess repo includes a gh-ost binary. We require gh-ost from openark/gh-ost as opposed to github/gh-ost because we've had to make some special adjustments to gh-ost so as to support this flow. I do not have direct ownership of github/gh-ost and cannot enforce those changes upstream, though I have made the contribution requests upstream.
- make build automatically appends gh-ost binary, compressed, to vttablet binary, via Ricebox.
- vttablet, upon startup, auto extracts gh-ost binary into /tmp/vt-gh-ost. Please note that the user does not need to install gh-ost.
- vttablet to report back the job as complete/failed. We look to use VExec. TBD.
Tracking breakdown
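The scheduling rules listed for vttablet (a single migration at a time; requests are queued; a queued migration becomes ready only when nothing is running and nothing else is ready; a ready migration is then run) can be sketched as a small in-memory state machine. This is a simplified Go illustration under assumptions: the `Scheduler` and `Migration` types and the status names are hypothetical, and the real implementation tracks this state in the _vt.schema_migrations table rather than in memory.

```go
package main

import "fmt"

// Status names are illustrative; the PR's actual values may differ.
type Status string

const (
	Queued   Status = "queued"
	Ready    Status = "ready"
	Running  Status = "running"
	Complete Status = "complete"
	Failed   Status = "failed"
)

// Migration is a hypothetical in-memory stand-in for a row in _vt.schema_migrations.
type Migration struct {
	UUID   string
	Status Status
}

// Scheduler keeps migrations in submission order.
type Scheduler struct {
	migrations []*Migration
}

// Submit queues a new migration request.
func (s *Scheduler) Submit(uuid string) {
	s.migrations = append(s.migrations, &Migration{UUID: uuid, Status: Queued})
}

// first returns the oldest migration in the given status, or nil.
func (s *Scheduler) first(st Status) *Migration {
	for _, m := range s.migrations {
		if m.Status == st {
			return m
		}
	}
	return nil
}

// Tick performs at most one transition and returns the migration it changed.
// It returns nil when a migration is already running or nothing is pending.
func (s *Scheduler) Tick() *Migration {
	if s.first(Running) != nil {
		return nil // only a single migration runs at a time
	}
	if m := s.first(Ready); m != nil {
		m.Status = Running
		return m
	}
	if m := s.first(Queued); m != nil {
		m.Status = Ready
		return m
	}
	return nil
}

// Finish records the outcome reported for a running migration.
func (s *Scheduler) Finish(uuid string, ok bool) {
	for _, m := range s.migrations {
		if m.UUID == uuid && m.Status == Running {
			if ok {
				m.Status = Complete
			} else {
				m.Status = Failed
			}
		}
	}
}

func main() {
	s := &Scheduler{}
	s.Submit("migration-a")
	s.Submit("migration-b")
	for i := 0; i < 3; i++ {
		if m := s.Tick(); m != nil {
			fmt.Println(m.UUID, "->", m.Status)
		} else {
			fmt.Println("no transition on this tick")
		}
	}
	s.Finish("migration-a", true)
	if m := s.Tick(); m != nil {
		fmt.Println(m.UUID, "->", m.Status)
	}
}
```

In this sketch the second migration stays queued until the first one reports completion or failure, which is the behaviour the list above describes.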
- OnlineDDL struct, defines a migration request and its status
- ALTER WITH 'gh-ost' TABLE and ALTER WITH 'pt-osc' TABLE syntax
- topo)
- vtctl to skip "big changes" check when -online_schema_change is given
- tablet_executor to submit an online DDL request to topo as opposed to running it on tablets
- vtctld runs a daemon to monitor for, and review migration requests
- vtctld evaluates which shards are affected
- _vt.schema_migrations backend table to support migration automation (on each shard))
- vttablet validates MySQL connection and variables
- vttablet creates migration command
- vttablet creates hooks
- vttablet provides HTTP API for hooks to report their status back
- vttablet creates gh-ost user with random password
- vttablet destroys gh-ost user upon completion
- gh-ost embedded in vttablet binary and auto-extracted by vttablet
- vttablet runs a dry-run execution
- vttablet runs a --execute (actual) execution
- vttablet supports a Cancel request (not used yet) to abort migration
- vttablet as a state machine to work through the migration steps
- gh-ost migration requests, successful and failed migrations
- VExec to apply migrations onto tablets
- VExec to control migrations (abort, retry)
- vttablet to heuristically check for available disk space
- gh-ost logs if necessary
- ALTER WITH 'gh-ost' TABLE... and ALTER WITH 'pt-osc' TABLE syntax make sense? Other?
- throttle by replica
- wait for replica to catch up with new credentials before starting the migration
- pt-online-schema-change bundled inside vttablet binary
- pt-online-schema-change define foreign key flags for - user can define as runtime flags
- pt-online-schema-change execution
- vttablet itself crashes
- pt-online-schema-change passwords are in cleartext. Can we avoid that?
- vtctl ApplySchema use same WITH 'gh-ost' and WITH 'pt-osc' query hints as in vtgate.
- gh-ost and pt-online-schema-change paths
- pt-osc triggers after migration failure
- pt-osc triggers on migration cancellation (overlaps with previous bullet, but has stronger guarantee)
- pt-osc triggers from stale/zombie pt-osc migration
- vtctl OnlineDDL command for simple visibility and manipulation. See Experimental: automated, scheduled, dependency free online DDL via gh-ost/pt-online-schema-change #6547 (comment)
- artifacts column, suggesting which tables need to be cleaned up after migration
Quite likely more entries to be added.
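One item in the breakdown, the gh-ost user with a random password that never shows up in ps, lends itself to a short Go sketch. It generates a random password and attaches it to a child process through the environment rather than the argument list. Everything named here is an assumption for illustration: the variable name ONLINE_DDL_PASSWORD, the helper names and the ./migrate.sh script are hypothetical and are not the names used by this PR.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"os"
	"os/exec"
)

// passwordEnvVar is a hypothetical variable name chosen for this sketch.
const passwordEnvVar = "ONLINE_DDL_PASSWORD"

// RandomPassword returns a hex-encoded random password built from nBytes random bytes.
func RandomPassword(nBytes int) (string, error) {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// CommandWithPassword prepares (but does not start) a child process that
// receives the password via its environment, keeping it out of the argument
// list that tools such as ps display.
func CommandWithPassword(password, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Env = append(os.Environ(), passwordEnvVar+"="+password)
	return cmd
}

func main() {
	pw, err := RandomPassword(16)
	if err != nil {
		panic(err)
	}
	// ./migrate.sh stands in for the generated wrapper script; it is not run here.
	cmd := CommandWithPassword(pw, "./migrate.sh", "--execute")
	fmt.Println("argv:", cmd.Args)
	fmt.Println("password length:", len(pw))
}
```

The wrapper script would then read the variable and hand the value to the migration tool in whatever way that tool supports; that last step is outside this sketch.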
Further reading, resources, acknowledgements
We're obviously using gh-ost. I use my own openark/gh-ost since I have no ownership of the original https://github.com/github/gh-ost. gh-ost was/is developed by GitHub 2016-2020.
pt-online-schema-change is part of the popular Percona Toolkit.
The schema migration scheduling and tracking work is based on my previous work at GitHub. The implementation in this PR is new and rewritten, but based on concepts that have matured in my work on skeefree. Consider these resources:
Also:
Initial incarnation of this PR: planetscale#67; some useful comments on that PR.
Call for feedback
We're looking for the community's feedback on the above suggestions/flow. Thank you for taking the time to read and respond!